When robots learn reward functions using high-capacity models that take raw state directly as input, they need to both learn a representation for what matters in the task -- the task "features" -- as well as how to combine these features into a single objective. If they try to do both at once from input designed to teach the full reward function, it is easy to end up with a representation that contains spurious correlations in the data, which fails to generalize to new settings. Instead, our ultimate goal is to enable robots to identify and isolate the causal features that people actually care about and use when they represent states and behavior. Our idea is that we can tune into this representation by asking users what behaviors they consider similar: behaviors will be similar if the features that matter are similar, even if low-level behavior is different; conversely, behaviors will be different if even one of the features that matter differs. This, in turn, is what enables the robot to disambiguate between what needs to go into the representation versus what is spurious, as well as what aspects of behavior can be compressed together versus not. The notion of learning representations based on similarity has a nice parallel in contrastive learning, a self-supervised representation learning technique that maps visually similar data points to similar embeddings, where similarity is defined by a designer through data-augmentation heuristics. By contrast, in order to learn the representations that people use, so we can learn their preferences and objectives, we use their definition of similarity. In simulation as well as in a user study, we show that learning through such similarity queries leads to representations that, while far from perfect, are indeed more generalizable than self-supervised and task-input alternatives.
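The similarity-query idea above can be illustrated with a contrastive-style loss. The following is a minimal sketch under assumed names (`similarity_loss`, plain lists standing in for learned trajectory feature vectors), not the paper's exact objective:

```python
import math

# Illustrative sketch of learning from similarity queries -- not the
# paper's exact method. Each trajectory is represented by a hypothetical
# feature vector; a user label says whether two behaviors are "similar".
# A contrastive-style loss pulls similar pairs together and pushes
# dissimilar pairs at least `margin` apart in feature space.

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def similarity_loss(feat_a, feat_b, similar, margin=1.0):
    d = euclidean(feat_a, feat_b)
    return d ** 2 if similar else max(margin - d, 0.0) ** 2

# A pair the user labels "similar" incurs loss proportional to its feature
# distance; a "dissimilar" pair is penalized only when closer than the margin.
print(similarity_loss([0.0, 0.0], [0.0, 0.1], similar=True))   # ~0.01
print(similarity_loss([0.0, 0.0], [2.0, 0.0], similar=False))  # 0.0
```

Because the loss only depends on the features that enter the distance, features irrelevant to the user's similarity judgments receive no gradient signal, which is what lets spurious correlates drop out of the representation.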
One of the most successful paradigms for reward learning uses human feedback in the form of comparisons. Although these methods hold promise, human comparison labeling is expensive and time consuming, constituting a major bottleneck to their broader applicability. Our insight is that we can greatly improve how effectively human time is used in these approaches by batching comparisons together, rather than having the human label each comparison individually. To do so, we leverage data dimensionality-reduction and visualization techniques to provide the human with an interactive GUI displaying the state space, in which the user can label subregions of the state space. Across several simple MuJoCo tasks, we show that this high-level approach holds promise and is able to greatly increase the performance of the resulting agents, given the same amount of human labeling time.
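Why region labels use human time efficiently can be made concrete. The names below (`comparisons_from_regions`, the toy 2-D states) are illustrative assumptions, not the paper's API: marking one region as preferred over another implies a comparison for every cross pair.

```python
# Sketch: a single pair of region labels in the visualized state space
# expands into many pairwise (preferred, dispreferred) comparisons --
# far more labels per unit of human effort than labeling pairs one by one.

def comparisons_from_regions(good_states, bad_states):
    """Expand two region labels into pairwise (preferred, dispreferred) labels."""
    return [(g, b) for g in good_states for b in bad_states]

good = [(0.1, 0.2), (0.15, 0.25), (0.2, 0.2)]  # states inside the "good" region
bad = [(0.8, 0.9), (0.85, 0.8)]                # states inside the "bad" region
pairs = comparisons_from_regions(good, bad)
print(len(pairs))  # 6 comparisons from two region labels
```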
Our goal is to enable robots to perform functional tasks in emotive ways, whether in response to the user's emotional state or to express their level of confidence. Prior work has proposed learning an independent cost function for each target emotion from user feedback, which the robot can then optimize alongside task- and environment-specific objectives in whatever situation it encounters. However, this approach is inefficient when modeling multiple emotions and cannot generalize to new ones. In this work, we leverage the fact that emotions are not independent of one another: they are related through the latent Valence-Arousal-Dominance (VAD) space. Our key idea is to learn a model that maps trajectories onto VAD using user labels. Given the distance between a trajectory's mapping and a target VAD, this single model can represent the cost function for every emotion. As a result, 1) all user feedback contributes to learning about every emotion; 2) the robot can generate trajectories for any emotion in the space, not just a few predefined ones; and 3) the robot can respond emotively to user-generated natural language by mapping it to a target VAD. We introduce a method for interactively learning the map from trajectories to this latent space, and test it in simulation and in a user study. In our experiments, we use a simple vacuum robot as well as the Cassie biped.
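The single VAD-based cost described above can be sketched as a distance in the latent space. All names and coordinates below are hypothetical; in the actual approach, the mapping from trajectories to VAD is learned from user labels:

```python
import math

# Sketch of a single cost function covering all emotions: a (learned)
# map sends a trajectory to a point in Valence-Arousal-Dominance space,
# and its cost for a target emotion is the distance to that emotion's
# VAD point. Coordinates here are made up for illustration.

def vad_cost(traj_vad, target_vad):
    """Cost of a trajectory for a target emotion = distance in VAD space."""
    return math.dist(traj_vad, target_vad)

# One model covers every emotion: "angry" and "calm" are just different
# target points, so any feedback that improves the mapping helps all of them.
angry = (-0.5, 0.8, 0.6)  # hypothetical VAD coordinates
calm = (0.6, -0.6, 0.0)
traj = (0.5, -0.5, 0.1)   # where some trajectory lands in VAD space
print(vad_cost(traj, calm) < vad_cost(traj, angry))  # True: this motion reads as calm
```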
Robots need to be able to learn concepts from their users in order to adapt their capabilities to each user's unique task. But when the robot operates on high-dimensional inputs, like images or point clouds, this is impractical: the robot would require an unrealistic amount of human effort to learn a new concept. To address this challenge, we propose a new approach in which the robot learns a low-dimensional variant of the concept and uses it to generate a larger data set for learning the concept in the high-dimensional space. This lets it take advantage of semantically meaningful privileged information that is only accessible at training time, like object poses and bounding boxes, which allows for richer human interaction to speed up learning. We evaluate our approach by learning prepositional concepts that describe object state or multi-object relations, like above, near, or aligned, which are key for users to specify task goals and execution constraints for the robot. Using a simulated human, we show that our approach improves sample complexity compared to learning concepts directly in the high-dimensional space. We also demonstrate the utility of the learned concepts in motion-planning tasks on a 7-DoF Franka Panda robot.
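A toy sketch of the bootstrapping idea, with hypothetical names: the low-dimensional concept (here, `above` defined on privileged object poses) is learned or specified cheaply, then used to auto-label a large synthetic data set for the high-dimensional learner:

```python
import random

# Sketch of concept bootstrapping (illustrative names, not the paper's
# code): once a concept is captured in the low-dimensional, privileged
# space of object poses, it can label arbitrarily many generated scenes
# at zero human cost; those labels then supervise a model operating on
# the high-dimensional input (e.g. point clouds) rendered from the poses.

def above(pose_a, pose_b, margin=0.05):
    """Low-dimensional 'above' concept on (x, y, z) object poses."""
    return pose_a[2] > pose_b[2] + margin

def generate_labeled_dataset(n, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = tuple(rng.uniform(0.0, 1.0) for _ in range(3))
        b = tuple(rng.uniform(0.0, 1.0) for _ in range(3))
        data.append(((a, b), above(a, b)))  # auto-label, no human query needed
    return data

dataset = generate_labeled_dataset(1000)
print(len(dataset))  # 1000 labeled pose pairs from zero additional human effort
```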
Neural algorithmic reasoning studies the problem of learning algorithms with neural networks, especially with graph architectures. A recent proposal, XLVIN, reaps the benefits of using a graph neural network that simulates the value iteration algorithm in deep reinforcement learning agents. It allows model-free planning without access to privileged information about the environment, which is usually unavailable. However, XLVIN only supports discrete action spaces, and hence is not directly applicable to most tasks of real-world interest. We expand XLVIN to continuous action spaces by discretization, and evaluate several selective expansion policies to deal with the large planning graphs. Our proposal, CNAP, demonstrates how neural algorithmic reasoning can make a measurable impact in higher-dimensional continuous control settings, such as MuJoCo, bringing gains in low-data settings and outperforming model-free baselines.
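The discretization step can be sketched as a uniform grid over each action dimension, which turns the continuous action space into a finite set a value-iteration-style planner can expand over. The function below is an illustrative assumption, not CNAP's actual interface:

```python
import itertools

# Sketch: cover a continuous action box [low, high] with a uniform grid,
# `bins_per_dim` points per dimension (must be >= 2). The planner then
# treats each grid point as one discrete action; the grid grows as
# bins_per_dim ** num_dims, which is why selective expansion policies
# are needed for the resulting large planning graphs.

def discretize_actions(low, high, bins_per_dim):
    axes = []
    for lo, hi in zip(low, high):
        step = (hi - lo) / (bins_per_dim - 1)
        axes.append([lo + i * step for i in range(bins_per_dim)])
    return list(itertools.product(*axes))

# A 2-D torque space in [-1, 1]^2 with 3 bins per dimension -> 9 actions.
actions = discretize_actions([-1.0, -1.0], [1.0, 1.0], bins_per_dim=3)
print(len(actions))             # 9
print(actions[0], actions[-1])  # (-1.0, -1.0) (1.0, 1.0)
```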
Deploying graph neural networks (GNNs) on whole-graph classification or regression tasks is known to be challenging: it often requires computing node features that are mindful of both local interactions in their neighbourhood and the global context of the graph structure. GNN architectures that navigate this space need to avoid pathological behaviours, such as bottlenecks and oversquashing, while ideally having linear time and space complexity requirements. In this work, we propose an elegant approach based on propagating information over expander graphs. We leverage an efficient method for constructing expander graphs of a given size, and use this insight to propose the EGP model. We show that EGP is able to address all of the above concerns, while requiring minimal effort to set up, and provide evidence of its empirical utility on relevant graph classification datasets and baselines in the Open Graph Benchmark. Importantly, using expander graphs as a template for message passing necessarily gives rise to negative curvature. While this appears to be counterintuitive in light of recent related work on oversquashing, we theoretically demonstrate that negatively curved edges are likely to be required to obtain scalable message passing without bottlenecks. To the best of our knowledge, this is a previously unstudied result in the context of graph representation learning, and we believe our analysis paves the way to a novel class of scalable methods to counter oversquashing in GNNs.
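The core mechanism can be sketched as alternating message passing over the input graph with message passing over a fixed sparse expander on the same node set, so information mixes globally in time linear in the number of nodes. The edge lists below are a toy stand-in (the actual EGP templates are Cayley graphs), so treat this as an illustration of the idea rather than the model itself:

```python
# Sketch of expander-based propagation (illustrative, not the EGP code):
# a layer over the input graph is followed by a layer over a sparse
# "expander" edge set, giving short paths between all nodes without
# adding dense connections.

def propagate(features, edges):
    """One mean-aggregation message-passing step over an undirected edge list."""
    n = len(features)
    sums = [0.0] * n
    counts = [0] * n
    for u, v in edges:
        sums[u] += features[v]; counts[u] += 1
        sums[v] += features[u]; counts[v] += 1
    return [s / c if c else f for s, c, f in zip(sums, counts, features)]

# Input graph: a path 0-1-2-3 (diameter 3). Toy "expander" shortcuts let
# node 0's signal reach node 3 after just two layers.
path_edges = [(0, 1), (1, 2), (2, 3)]
expander_edges = [(0, 2), (1, 3), (0, 3), (1, 2)]  # toy stand-in for a Cayley-graph template

h = [1.0, 0.0, 0.0, 0.0]
h = propagate(h, path_edges)      # local step on the input graph
h = propagate(h, expander_edges)  # global mixing step on the expander
print(h[3] > 0)  # True: node 3 has already received mass from node 0
```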
The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalizes out of distribution. While recent years have seen a surge in methodological improvements in this area, they have mostly focused on building specialist models. Specialist models are capable of learning to execute either only one algorithm, or a collection of algorithms with an identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner: a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding, and geometry. We leverage the CLRS benchmark to empirically show that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute each of them well in a single-task regime. Motivated by this, we present a series of improvements to CLRS's input representation, training regime, and processor architecture, improving average single-task performance by over 20%. We then conduct a thorough ablation of multi-task learners that leverage these improvements. Our results show that a generalist learner effectively incorporates knowledge captured by specialist models.
Replicating natural human movement is a long-standing goal of robot control theory. Drawing inspiration from biology, where reaching-control networks produce smooth and precise movements, can narrow the performance gap between human and robot control. Neuromorphic processors, which mimic the brain's computational principles, are an ideal platform for approximating the accuracy and smoothness of such controllers while maximizing energy efficiency and robustness. However, the incompatibility of conventional control methods with neuromorphic hardware limits the computational efficiency and explainability of their existing adaptations. In contrast, the neuronal connectome underlying smooth and accurate reaching movements is effective, minimal, and inherently compatible with neuromorphic processors. In this work, we emulate these networks and propose a biologically realistic spiking neural network for motor control. Our controller incorporates adaptive feedback to provide smooth and accurate motor control while inheriting the minimal complexity of its biological counterpart, which controls reaching movements, enabling direct deployment on Intel's neuromorphic processor. Using our controller as a building block, and inspired by joint coordination in human arms, we scale up our approach to control real-world robot arms. The trajectories and smooth, minimum-jerk velocity profiles of the resulting motions resemble human movements, validating the biological relevance of our controller. Notably, our approach achieves state-of-the-art control performance while reducing motion jerk by 19% for improved motion smoothness. Our work suggests that exploiting the brain's computational units and their connectivity may lead to the design of effective, efficient, and interpretable neuromorphic controllers, paving the way for neuromorphic solutions in fully autonomous systems.
Autonomous agents require self-localization to navigate unknown environments. They can use visual odometry (VO) to estimate self-motion and localize themselves using visual sensors. Unlike inertial sensors, or wheel encoders subject to slippage, this motion-estimation strategy is not compromised by drift. However, VO with conventional cameras is computationally demanding, limiting its application in systems with strict low-latency, low-memory, and low-energy requirements. Event-based cameras and neuromorphic computing hardware offer a promising low-power solution to the VO problem. However, conventional VO algorithms do not translate readily to neuromorphic hardware. In this work, we present a VO algorithm built entirely from neuronal building blocks suitable for neuromorphic implementation. The building blocks are groups of neurons representing vectors in the computational framework of Vector Symbolic Architectures (VSA), which has been proposed as an abstraction layer for programming neuromorphic hardware. The VO network we propose generates and stores a working memory of the visual environment it is presented with, updating this working memory while simultaneously estimating the changing location and orientation of the camera. We demonstrate how VSA can be leveraged as a computational paradigm for neuromorphic robotics. Moreover, our results represent an important step towards using neuromorphic computing hardware for fast and power-efficient VO, and for the related task of simultaneous localization and mapping (SLAM). We validate this approach experimentally in a robotic task and on an event-based dataset, demonstrating state-of-the-art performance.
Inferring the position of objects and their rigid transformations remains an open problem in visual scene understanding. Here we propose a neuromorphic solution that utilizes an efficient factorization network based on three key concepts: (1) a computational framework based on Vector Symbolic Architectures (VSA) with complex-valued vectors; (2) the design of hierarchical resonator networks (HRN) to deal with the non-commutative nature of translation and rotation in visual scenes when both are combined; and (3) the design of a multi-compartment spiking phasor neuron model for implementing complex-valued vector binding on neuromorphic hardware. The VSA framework uses vector binding operations to produce a generative image model in which binding acts as the equivariant operation for geometric transformations. A scene can therefore be described as a sum of vector products, which in turn can be efficiently factorized by a resonator network to infer objects and their poses. The HRN enables the definition of a partitioned architecture in which vector binding is equivariant for horizontal and vertical translation within one partition, and for rotation and scaling within the other. The spiking neuron model allows the resonator network to be mapped onto efficient, low-power neuromorphic hardware. In this work, we demonstrate our approach using synthetic scenes composed of simple 2D shapes undergoing rigid geometric transformations and color changes. A companion paper demonstrates this approach in real-world application scenarios for machine vision and robotics.
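The complex-valued binding at the heart of concept (1) can be sketched with unit-magnitude phasor vectors, where binding is the element-wise product and unbinding multiplies by the complex conjugate. This is an illustration of the generic VSA operations, not the paper's implementation:

```python
import cmath
import random

# Sketch of complex-valued VSA binding: vectors are arrays of unit
# phasors exp(i*theta). Binding (element-wise product) composes a
# "shape" vector with a "transform" vector; unbinding (product with the
# conjugate) inverts it exactly, since |exp(i*theta)| = 1.

def random_phasor_vec(n, seed=None):
    rng = random.Random(seed)
    return [cmath.exp(1j * rng.uniform(0.0, 2.0 * cmath.pi)) for _ in range(n)]

def bind(x, y):
    return [a * b for a, b in zip(x, y)]

def unbind(x, y):  # conjugate product = exact inverse for unit phasors
    return [a * b.conjugate() for a, b in zip(x, y)]

def similarity(x, y):  # normalized real inner product
    return sum((a * b.conjugate()).real for a, b in zip(x, y)) / len(x)

shape = random_phasor_vec(256, seed=0)      # hypothetical object-identity vector
translate = random_phasor_vec(256, seed=1)  # hypothetical transform vector
scene = bind(shape, translate)              # object placed in the scene
recovered = unbind(scene, translate)
print(round(similarity(recovered, shape), 6))  # 1.0: exact recovery
```

Because a bound vector is near-orthogonal to its factors, a scene encoded as a sum of such products can be pulled apart factor by factor, which is what the resonator network exploits.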